1) We want to show as much data as possible to see where the data gaps are and what we can do so far.
2) For the items recorded as "canvasdummyname", we have an exported CSV with the IP addresses to try to match them.
3) Then we can decide how to edit the xAPI process.
import json
import pandas as pd
import plotly
import plotly.graph_objects as go
import plotly.express as px
from pandas import json_normalize
with open('Sample LRS rinnuja.json') as json_file:
    xapiData = json.load(json_file)
# now let's clean up the data to pass into the data frame
# we want to iterate through and create one row per statement and one column for each nested field
The goal is to get the data frame to go from this JSON format...
{
    "actor": {
        "name": "canvasdummyname",
        "mbox": "mailto:canvasdummyemail@gmail.com"
    },
    "verb": {
        "id": "http://adlnet.gov/expapi/verbs/answered",
        "display": {
            "en-US": "answered"
        }
    },
    "object": {
        "id": "https://elearn.ucr.edu/courses/3730",
        "definition": {
            "name": {
                "en-US": " Week 1 Module 1: Moments"
            },
            "description": {
                "en-US": "Student has answered slide 8.4"
            },
            "type": "http://id.tincanapi.com/activitytype/slide"
        },
        "objectType": "Activity"
    },
    "result": {
        "response": "50",
        "duration": "PT4S",
        "score": {
            "min": 0,
            "max": 6,
            "raw": 3,
            "scaled": 0.5
        },
        "success": false
    },
    "id": "c5a83fe7-1927-4696-ad4b-57bb322ed06a",
    "timestamp": "2021-06-18T07:29:40.802Z",
    "stored": "2021-06-18T07:29:40.802Z",
    "authority": {
        "objectType": "Agent",
        "account": {
            "homePage": "https://xcite-testing.lrs.io/keys/authorization",
            "name": "authorization"
        }
    }
}
...to something like the table below, instead of a list of complex JSON statements.
| df | name (string) | verb id (string) | object id (string) | object description (string) | result duration (string) | result response (string) | score max (int) | score raw (int) | score scaled (double/float) | success (boolean) | timestamp (string) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Nicole Garcia | http://id.tincanapi.com/verb/viewed | https://elearn.ucr.edu/courses/3730 | Student has viewed video: Truss reaction Forces | PT15S | response here | 6 | 4 | 0.66 | false | 2021-06-18T07:29:40.802Z |
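To make the target shape concrete, here is a minimal sketch of the flattening idea: nested keys become dotted column names, one row per statement. The helper `flatten_statement` is a hypothetical name written for illustration; it is a simplified stand-in and not how pandas implements `json_normalize`. The statement is an abbreviated copy of the sample above.

```python
def flatten_statement(statement, parent_key="", sep="."):
    """Flatten nested dicts into one dict with dotted keys (illustrative helper)."""
    flat = {}
    for key, value in statement.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects such as "verb" -> "display" -> "en-US"
            flat.update(flatten_statement(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


# Abbreviated copy of the sample statement shown above
statement = {
    "actor": {"name": "canvasdummyname"},
    "verb": {
        "id": "http://adlnet.gov/expapi/verbs/answered",
        "display": {"en-US": "answered"},
    },
    "result": {
        "duration": "PT4S",
        "score": {"raw": 3, "scaled": 0.5},
        "success": False,
    },
}

row = flatten_statement(statement)
```

Each key of `row` (for example `result.score.raw`) corresponds to one column of the data frame.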
#https://www.delftstack.com/howto/python-pandas/json-to-pandas-dataframe/
df = json_normalize(xapiData)
df.head()
| | id | timestamp | stored | actor.name | actor.mbox | verb.id | verb.display.en-US | object.id | object.definition.name.en-US | object.definition.description.en-US | ... | result.duration | authority.objectType | authority.account.homePage | authority.account.name | result.response | result.score.min | result.score.max | result.score.raw | result.score.scaled | result.success |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 4e66ecaa-48f6-4b14-b948-f15068983b33 | 2021-06-15T02:14:02.632Z | 2021-06-15T02:14:02.632Z | canvas test name | mailto:canvastestnamehere@gmail.com | http://adlnet.gov/expapi/verbs/initialized | initialized | https://elearn.ucr.edu/courses/3730 | Cal Labs Module 1: Method of Joints | Student has started module 1: Method of joints | ... | PT2S | Agent | https://xcite-testing.lrs.io/keys/authorization | authorization | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | f15f13c6-144d-4b39-b066-1b37fb39786c | 2021-06-15T03:02:02.812Z | 2021-06-15T03:02:02.812Z | Raghav | mailto:2001guptar@gmail.com | http://adlnet.gov/expapi/verbs//viewed | viewed | tag:adlnet.gov,2013:expapi:0.9:activities:slide | Student has viewed this ID | NaN | ... | NaN | Agent | https://xcite-testing.lrs.io/keys/authorization | authorization | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | 6c773444-d8a9-442e-a191-b44d62746d71 | 2021-06-15T18:45:25.673Z | 2021-06-15T18:45:25.673Z | canvas test name | mailto:canvastestnamehere@gmail.com | http://id.tincanapi.com/verb/viewed | viewed | https://elearn.ucr.edu/courses/3730 | Cal Labs Module 1: Method of joints | Student has viewed video: Truss reaction Forces | ... | PT8M28S | Agent | https://xcite-testing.lrs.io/keys/authorization | authorization | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | c7f0f029-063c-428e-b538-1d9e3d3db56a | 2021-06-16T06:40:39.840Z | 2021-06-16T06:40:39.840Z | canvas test name | mailto:canvastestnamehere@gmail.com | http://adlnet.gov/expapi/verbs/initialized | initialized | https://elearn.ucr.edu/courses/3730 | Cal Labs Module 1: Method of Joints | Student has started module 1: Method of joints | ... | PT3S | Agent | https://xcite-testing.lrs.io/keys/authorization | authorization | NaN | NaN | NaN | NaN | NaN | NaN |
| 4 | ec2683f8-f7e5-472e-9895-c8f090a1a150 | 2021-06-16T06:40:40.923Z | 2021-06-16T06:40:40.923Z | canvas test name | mailto:canvastestnamehere@gmail.com | http://id.tincanapi.com/verb/viewed | viewed | https://elearn.ucr.edu/courses/3730 | Cal Labs Module 1: Method of joints | Student has viewed video: Truss reaction Forces | ... | PT1S | Agent | https://xcite-testing.lrs.io/keys/authorization | authorization | NaN | NaN | NaN | NaN | NaN | NaN |
5 rows × 22 columns
1) To decipher which statements came from the webpages or Storyline: it looks like the LRS records which access key was used. For right now we both used the same keys so we can use it.
2) df is 21,343 statements with 22 different columns. This matches the LRS records.
3) The data is too wide.
4) verb id and verb display are essentially the same thing, so we only need one. There are other similar cases.
5) Based on the method used to grab the username via the Canvas LMS, I'd be more confident using actor.name than actor.mbox
(I'm not sure if the Canvas string changes per session, though I also think it's possible to change your name through Canvas, so that could be an issue for another time.)
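Points 3 and 4 can be checked before deleting anything. The sketch below is one possible way to do it, run on a small stand-in data frame with illustrative values (not the real LRS export): it lists columns that hold a single value, and tests whether every `verb.id` maps to exactly one `verb.display.en-US` label.

```python
import pandas as pd

# Small stand-in for the normalized data frame; values are illustrative only.
sample = pd.DataFrame({
    "actor.name": ["canvas test name", "Raghav", "canvas test name"],
    "verb.id": [
        "http://adlnet.gov/expapi/verbs/initialized",
        "http://id.tincanapi.com/verb/viewed",
        "http://id.tincanapi.com/verb/viewed",
    ],
    "verb.display.en-US": ["initialized", "viewed", "viewed"],
    "authority.objectType": ["Agent", "Agent", "Agent"],
})

# Columns with a single distinct value carry no information for the analysis.
distinct_counts = sample.nunique(dropna=False)
constant_columns = distinct_counts[distinct_counts <= 1].index.tolist()

# verb.id adds nothing over verb.display.en-US if each id has exactly one label.
labels_per_verb = sample.groupby("verb.id")["verb.display.en-US"].nunique()
verb_is_redundant = bool((labels_per_verb == 1).all())
```

On the real data the same two checks would be run against `df` instead of `sample`.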
trimmed_df = df.copy()  # to not change the original df
# the fields below are all the same value, English versions of other fields, or too general to be useful right now. These fields can be edited in revisions to xAPI
# del trimmed_df['id'] actually keep this, we can use this with the IP CSV
del trimmed_df['object.id']
del trimmed_df['object.objectType']
del trimmed_df['authority.objectType']
del trimmed_df['authority.account.homePage']
del trimmed_df['authority.account.name']
del trimmed_df['verb.id']
del trimmed_df['stored']
del trimmed_df['actor.mbox']
del trimmed_df['object.definition.type']
trimmed_df.head()
| | id | timestamp | actor.name | verb.display.en-US | object.definition.name.en-US | object.definition.description.en-US | result.duration | result.response | result.score.min | result.score.max | result.score.raw | result.score.scaled | result.success |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 4e66ecaa-48f6-4b14-b948-f15068983b33 | 2021-06-15T02:14:02.632Z | canvas test name | initialized | Cal Labs Module 1: Method of Joints | Student has started module 1: Method of joints | PT2S | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | f15f13c6-144d-4b39-b066-1b37fb39786c | 2021-06-15T03:02:02.812Z | Raghav | viewed | Student has viewed this ID | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | 6c773444-d8a9-442e-a191-b44d62746d71 | 2021-06-15T18:45:25.673Z | canvas test name | viewed | Cal Labs Module 1: Method of joints | Student has viewed video: Truss reaction Forces | PT8M28S | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | c7f0f029-063c-428e-b538-1d9e3d3db56a | 2021-06-16T06:40:39.840Z | canvas test name | initialized | Cal Labs Module 1: Method of Joints | Student has started module 1: Method of joints | PT3S | NaN | NaN | NaN | NaN | NaN | NaN |
| 4 | ec2683f8-f7e5-472e-9895-c8f090a1a150 | 2021-06-16T06:40:40.923Z | canvas test name | viewed | Cal Labs Module 1: Method of joints | Student has viewed video: Truss reaction Forces | PT1S | NaN | NaN | NaN | NaN | NaN | NaN |
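Since the `id` column was kept for use with the IP CSV, the match could look roughly like the sketch below. This is an assumption-heavy illustration: the CSV layout (`statement_id`, `ip_address`), the statement ids, and the IP values are all hypothetical, because the real export is not shown here. A left merge keeps every statement, and the `indicator` column shows which ones found no IP.

```python
import io

import pandas as pd

# Stand-in for the trimmed statements; ids and names are illustrative.
statements = pd.DataFrame({
    "id": ["stmt-1", "stmt-2", "stmt-3"],
    "actor.name": ["canvasdummyname", "Nicole Garcia", "canvasdummyname"],
})

# Hypothetical CSV layout; the real export may use different column names.
ip_csv = io.StringIO(
    "statement_id,ip_address\n"
    "stmt-1,10.0.0.5\n"
    "stmt-3,10.0.0.7\n"
)
ip_log = pd.read_csv(ip_csv)

# Left merge: every statement is kept, matched or not.
merged = statements.merge(
    ip_log, how="left", left_on="id", right_on="statement_id", indicator=True
)

# Focus on the placeholder actors and see which ones still lack an IP.
dummy = merged[merged["actor.name"] == "canvasdummyname"]
unmatched = dummy[dummy["_merge"] == "left_only"]
```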
# wide data format
base_df = trimmed_df.copy()
fig = px.box(base_df, x='actor.name', y='result.score.raw')
fig.update_layout(title = "box plot, actor vs result score raw (entire course)")
fig.show()
# let's get a box plot for each module
# https://stackoverflow.com/questions/17071871/how-do-i-select-rows-from-a-dataframe-based-on-column-values
temp = base_df.loc[base_df['object.definition.name.en-US'] == "Week 5 Module 12: Friction"].copy()
fig = px.box(temp, x='actor.name', y = 'result.score.raw' )
fig.update_layout(title = "box plot, actor vs result score raw (Week 5 Module 12: Friction)")
fig.show()
fig = px.box(temp, x='actor.name', y = 'result.score.scaled' )
fig.update_layout(title = "box plot, actor vs result score scaled (Week 5 Module 12: Friction)")
fig.show()
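The box plots above cover one module at a time. As a rough complement, the sketch below computes the same kind of per-module, per-actor score summary as a table using `groupby`. The rows and student names are made up for illustration; only the column names follow the trimmed data frame.

```python
import pandas as pd

# Illustrative rows only; column names match the trimmed data frame.
scores = pd.DataFrame({
    "object.definition.name.en-US": [
        "Week 1 Module 1: Moments",
        "Week 1 Module 1: Moments",
        "Week 5 Module 12: Friction",
        "Week 5 Module 12: Friction",
    ],
    "actor.name": ["Student 1", "Student 1", "Student 1", "Student 2"],
    "result.score.raw": [3, 5, 2, None],
})

# Statements without a score (views, initializations) are NaN; drop them first.
scored = scores.dropna(subset=["result.score.raw"])

# One row per (module, actor) with basic score statistics.
summary = (
    scored.groupby(["object.definition.name.en-US", "actor.name"])["result.score.raw"]
    .agg(["count", "median", "min", "max"])
    .reset_index()
)
```

The same `groupby` on the module name could also drive a loop that draws one box plot per module.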
temp = base_df.loc[(base_df['object.definition.name.en-US'] == "Week 5 Module 12: Friction") & (base_df['object.definition.description.en-US'] == "Student has answered slide 3.2")].copy()
temp = temp.sort_values(by=['result.duration'])
fig = px.line(temp, x='result.duration', y = 'result.score.raw' )
fig.update_layout(title = "Duration vs result score raw (Week 5 Module 12: Friction) \"Student has answered slide 3.2\" ")
fig.show()
temp = base_df.loc[(base_df['object.definition.name.en-US'] == "Week 5 Module 12: Friction") & (base_df['object.definition.description.en-US'] == "Student has answered slide 3.3")].copy()
temp = temp.sort_values(by=['result.duration'])
fig = px.box(temp, x='result.duration', y = 'result.score.raw' )
fig.update_layout(title = "Duration vs result score raw (Week 5 Module 12: Friction) \"Student has answered slide 3.3\" ")
fig.show()
temp = base_df.loc[(base_df['object.definition.name.en-US'] == "Week 5 Module 12: Friction") & (base_df['object.definition.description.en-US'] == "Student has answered slide 3.3")].copy()
temp = temp.sort_values(by=['result.response'])
fig = px.pie(temp, values='result.score.raw', names ='result.response' )
fig.update_layout(title = "response vs result score raw (Week 5 Module 12: Friction) \"Student has answered slide 3.3\" ")
fig.show()
temp = base_df.loc[(base_df['object.definition.name.en-US'] == "Week 5 Module 12: Friction") & (base_df['object.definition.description.en-US'] == "Student has answered slide 3.3")].copy()
temp = temp.sort_values(by=['result.duration'])
fig = px.scatter(temp, x='result.duration', y ='result.response' )
fig.update_layout(title = "Duration vs response (Week 5 Module 12: Friction) \"Student has answered slide 3.3\" ")
fig.show()
# duration would be easier to read if taken out of ISO format
TODO: Parse ISO 8601 duration format into seconds to make it more sortable
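A possible starting point for this TODO, using only the standard library. The function name `iso_duration_to_seconds` is hypothetical, and the sketch only handles the day/hour/minute/second subset of ISO 8601 durations (values like `PT4S` and `PT8M28S` seen in this data). Year, month, and week designators are not supported and return `None`.

```python
import re

# Day/hour/minute/second subset of ISO 8601 durations, e.g. "PT8M28S".
_ISO_DURATION = re.compile(
    r"^P(?:(?P<days>\d+(?:\.\d+)?)D)?"
    r"(?:T(?:(?P<hours>\d+(?:\.\d+)?)H)?"
    r"(?:(?P<minutes>\d+(?:\.\d+)?)M)?"
    r"(?:(?P<seconds>\d+(?:\.\d+)?)S)?)?$"
)


def iso_duration_to_seconds(value):
    """Return the duration in seconds, or None for missing or unparseable input."""
    if not isinstance(value, str):
        return None  # NaN cells in the data frame are floats, not strings
    text = value.strip()
    match = _ISO_DURATION.match(text)
    if match is None or text in ("P", "PT"):
        return None  # wrong shape, or a designator with no components
    parts = {name: float(num) if num else 0.0 for name, num in match.groupdict().items()}
    return (
        parts["days"] * 86400
        + parts["hours"] * 3600
        + parts["minutes"] * 60
        + parts["seconds"]
    )
```

Applied to the data frame, something like `base_df['result.duration'].map(iso_duration_to_seconds)` would give a numeric column that sorts correctly, so that `PT8M28S` no longer sorts as text.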
temp = base_df.loc[(base_df['object.definition.name.en-US'] == "Week 5 Module 12: Friction") & (base_df['object.definition.description.en-US'] == "Student has answered slide 3.3") & (base_df['actor.name']== ' Evan Carl Renck') ].copy()
# due to error in split function, there is a space in front of all names
temp = temp.sort_values(by=['result.response'])
fig = px.pie(temp, values='result.score.raw', names ='result.response' )
fig.update_layout(title = "response vs result score raw for Evan Carl Renck (Week 5 Module 12: Friction) \"Student has answered slide 3.3\" ")
fig.show()
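The leading space in the names can be removed once instead of being typed into every filter. A minimal sketch on a stand-in column (the values are illustrative), using the pandas string accessor:

```python
import pandas as pd

# Stand-in for the actor.name column, including the stray leading space.
names = pd.DataFrame({"actor.name": [" Evan Carl Renck", " canvas test name", None]})

# Strip surrounding whitespace; missing values stay missing.
names["actor.name"] = names["actor.name"].str.strip()

# Filters can now use the clean name.
matches = names[names["actor.name"] == "Evan Carl Renck"]
```

This only cleans the data frame; the split function that introduces the space would still need fixing at the source.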
#run this below to export as html and keep graph interactivity
plotly.offline.init_notebook_mode()